14.1 Introduction to Naïve Bayes
- Understand conditional probabilities and Bayes' Theorem.
- Be able to use training data to calculate conditional probabilities.
- Understand the Bag of Words method for creating probabilities from training data.
- Be able to classify documents using training data and the Naïve Bayes Classifier.
- Be able to predict the classification of categorical data using the Naïve Bayes Classifier.
The most common prediction problem data analysts face is binary classification. Does the patient have heart disease or not? Is the email spam or not? Is the potential customer likely to buy or not? Is the insurance claim fraudulent or not? Binary classification problems have Yes/No (1/0) answers. There are many methods for binary classification, including logistic regression, decision trees, and Naïve Bayes.
In addition to binary classification, there are situations where classification is required among more than two options. The Naïve Bayes classifier can also be used for this type of problem. In this lesson, we will learn about the Naïve Bayes classification method, both for binary classification and for classification with more than two classes.
The chapter first introduces conditional probabilities and shows how the Naïve Bayes Classifier uses them to solve a prediction problem. It then explains the classifier in detail and works through a simple lexical example that classifies email as normal or spam.
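To make the idea concrete before the detailed treatment, the sketch below uses small, made-up word counts (the counts, words, and the `score` helper are illustrative, not the chapter's actual example) to classify one message as Normal or Spam by multiplying a class prior by the conditional probability of each word given the class.

```python
# Minimal Naïve Bayes sketch with hypothetical training counts.
# word_counts[label][word] = how often that word appeared in that class.
word_counts = {
    "Normal": {"dear": 8, "friend": 5, "lunch": 3, "money": 1},
    "Spam":   {"dear": 2, "friend": 1, "lunch": 0, "money": 4},
}
class_counts = {"Normal": 10, "Spam": 5}   # training messages per class
total_messages = sum(class_counts.values())

def score(message_words, label):
    """P(label) multiplied by P(word | label) for each word in the message."""
    total_words = sum(word_counts[label].values())
    prob = class_counts[label] / total_messages            # prior P(label)
    for word in message_words:
        prob *= word_counts[label].get(word, 0) / total_words  # P(word | label)
    return prob

message = ["dear", "friend"]
scores = {label: score(message, label) for label in class_counts}
print(scores)                          # approx {'Normal': 0.092, 'Spam': 0.014}
print(max(scores, key=scores.get))     # 'Normal' has the larger score
```

The class with the larger score is the prediction; the chapter explains where these conditional probabilities come from and why multiplying them is justified by the "naïve" independence assumption.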
The next section contains an example that expands the simple binary test used to explain Naïve Bayes into a multi-class problem: categorizing different types of flowers based on several characteristics. A final, more detailed lexical example builds Bag of Words counts, creates conditional probabilities from the training data, and then uses those probabilities to determine the class of text documents. Even though the examples in this chapter are lexical, the Naïve Bayes Classifier works for many other types of data; non-lexical examples simply require other methods to determine the conditional probabilities.
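The Bag of Words step can also be previewed with a short sketch. The tiny labeled training set below is hypothetical; it only shows the mechanics of counting word frequencies per class and converting those counts into conditional probabilities P(word | class).

```python
# Bag of Words sketch with a hypothetical three-document training set.
from collections import Counter

training_docs = [
    ("Normal", "dear friend lunch today"),
    ("Normal", "lunch with a friend"),
    ("Spam",   "money money dear winner"),
]

# Build one bag (word-count table) per class.
bags = {}
for label, text in training_docs:
    bags.setdefault(label, Counter()).update(text.split())

# Convert counts to conditional probabilities P(word | class).
cond_prob = {
    label: {word: count / sum(bag.values()) for word, count in bag.items()}
    for label, bag in bags.items()
}

print(cond_prob["Spam"]["money"])   # 2 of the 4 Spam words -> 0.5
```

In the chapter's examples, these conditional probabilities, together with the class priors, are exactly the quantities the classifier multiplies to score each candidate class for a new document.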